feat(webapp): mollifier API mutations on buffered runs#3756
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🦋 Changeset detectedLatest commit: 8d7ebee The changes in this PR will be included in the next version bump. This PR includes changesets to release 32 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
1838229 to
af0aeeb
Compare
b8ead31 to
109fbd7
Compare
af0aeeb to
857bba3
Compare
109fbd7 to
c7a66bd
Compare
857bba3 to
21babc8
Compare
094d006 to
9914976
Compare
0919f7a to
f36c576
Compare
f4b6064 to
0547ba9
Compare
c8ab214 to
047b240
Compare
0547ba9 to
0708ce5
Compare
047b240 to
e57bc5e
Compare
0708ce5 to
396552e
Compare
396552e to
eb2a777
Compare
f4131eb to
d8f6cf7
Compare
e0a57d9 to
b3d188c
Compare
d8f6cf7 to
796a2c0
Compare
b3d188c to
c970692
Compare
796a2c0 to
b139391
Compare
c970692 to
ba084d8
Compare
b139391 to
d153042
Compare
ba084d8 to
eb520ef
Compare
fe21821 to
4d5d0e0
Compare
63ae447 to
0c6b510
Compare
…llowups Four Devin-flagged issues on PR #3756: * Cross-env auth gate on the buffer mutation path (#2). mutateWithFallback and applyMetadataMutationToBufferedRun now verify entry.envId/orgId match the caller's environmentId/organizationId before any buffer write, so a token authed in env A can't mutate a buffered run in env B by guessing the friendlyId. Mismatches return not_found (no existence leak), mirroring the env scoping the PG path already enforces via Prisma filters. * Unhandled error in routeOperationsToRun (#0). The parent/root op fan-out is documented as best-effort but a Redis throw used to 500 the request even though the primary mutation already landed. The buffer fallback now runs through tryCatch and warns instead of throwing. * Silent no-op when parent metadata routes to an internal id (#1). The PG service accepts internal ids, but the buffer is keyed by friendlyId; passing an internal cuid to the fallback was a silent miss. Made it an intentional skip (with a comment explaining why a buffered child's parent is always materialised already). * BufferedReplayInputSchema strips seedMetadata (#3). Replays from a buffered source were silently losing initial metadata vs PG-sourced replays. Added seedMetadata + seedMetadataType to the schema. Tests added: cross-env + cross-org gate cases on both helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llowups Four Devin-flagged issues on PR #3756: * Cross-env auth gate on the buffer mutation path (#2). mutateWithFallback and applyMetadataMutationToBufferedRun now verify entry.envId/orgId match the caller's environmentId/organizationId before any buffer write, so a token authed in env A can't mutate a buffered run in env B by guessing the friendlyId. Mismatches return not_found (no existence leak), mirroring the env scoping the PG path already enforces via Prisma filters. * Unhandled error in routeOperationsToRun (#0). The parent/root op fan-out is documented as best-effort but a Redis throw used to 500 the request even though the primary mutation already landed. The buffer fallback now runs through tryCatch and warns instead of throwing. * Silent no-op when parent metadata routes to an internal id (#1). The PG service accepts internal ids, but the buffer is keyed by friendlyId; passing an internal cuid to the fallback was a silent miss. Made it an intentional skip (with a comment explaining why a buffered child's parent is always materialised already). * BufferedReplayInputSchema strips seedMetadata (#3). Replays from a buffered source were silently losing initial metadata vs PG-sourced replays. Added seedMetadata + seedMetadataType to the schema. Tests added: cross-env + cross-org gate cases on both helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ionsToRun Follow-up to the Devin review on PR #3756. updateMetadataService.call resolves with undefined (not a throw) when the target run isn't in PG, so the previous `if (!error) return;` treated PG-miss as success and the buffer fallback below was unreachable. Capture the result too and fall through only on PG-applied; PG-miss now proceeds to the buffer fallback alongside the PG-threw path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ionsToRun Follow-up to the Devin review on PR #3756. updateMetadataService.call resolves with undefined (not a throw) when the target run isn't in PG, so the previous `if (!error) return;` treated PG-miss as success and the buffer fallback below was unreachable. Capture the result too and fall through only on PG-applied; PG-miss now proceeds to the buffer fallback alongside the PG-threw path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cancel, replay, reschedule, metadata, tags, and idempotency-key-reset now succeed against a run that's still in the mollifier buffer. Mutations are applied to the buffered snapshot via Lua CAS; the drainer carries the mutation forward when it replays. Stacked on the reads PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- metadata route: drop the \`as unknown as Parameters<...>\` cast on the parent/root operations path. Widen \`routeOperationsToRun\`'s env parameter to \`AuthenticatedEnvironment\` so the service's typed signature carries through; the caller always has the full env in scope. - replay route: validate the buffered fallback against a Zod \`BufferedReplayInputSchema\` covering the fields \`ReplayTaskRunService.call\` actually reads (id, friendlyId, runtimeEnvironmentId, taskIdentifier, payload, payloadType, queue, isTest, traceId, spanId, engine, runTags + nullable concurrencyKey/workerQueue/machinePreset/realtimeStreamsVersion). Schema-fail logs the issue list and 404s rather than passing a half-shaped object into the service. - resetIdempotencyKey: distinguish "PG-empty + buffer-cleared-nothing" (genuine 404) from "PG-empty + buffer-unreachable" (partial outage — 503 with retry hint). The previous behaviour silently returned 404 on outage, hiding the partial failure and leaving a buffered key effectively un-reset. New regression test covers all four branches (PG-hit + buffer-throws, PG-empty + buffer-hit, PG-empty + buffer-clean-miss, PG-empty + buffer-outage, mollifier-disabled). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- metadata route was routing rootOperations to bufferedEntry.parentTaskRunId with a comment claiming PG's nil-coalesce defaults to parent. PG actually defaults to taskRun.id (self), so a buffered grandchild metadata.root.set() was silently mutating the child's metadata instead of the root's. SyntheticRun already carries rootTaskRunFriendlyId from the snapshot — use it, falling back to the run itself (matching PG) when absent. - reschedule route's PG path delegates to RescheduleTaskRunService which enforces `status !== "DELAYED"` and 422s otherwise. The buffer path had no equivalent guard, so a customer could inject delayUntil into the snapshot of an undelayed buffered run and the drainer would materialise it with an unintended delay. Added a pre-fetch through findRunByIdWithMollifierFallback and 422 when the buffered run has no delayUntil. SyntheticRun doesn't carry a "DELAYED" status enum (only QUEUED|FAILED|CANCELED) so the gate reads the snapshot's delayUntil field directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wait-and-bounce loop for mutations racing a mid-drain run polled the PG primary on a fixed 20ms cadence with no jitter — up to ~100 reads per request, synchronized across concurrent waiters, piling load onto the writer exactly when mollifier is engaged to shed it. The drainer writes the canonical PG row BEFORE it acks (sets `materialised`) or fails (deletes the entry), so the buffer entry's own state is an authoritative, already-in-Redis signal for "is the row in PG yet?". Watch that (cheap Redis getEntry) instead, and touch the primary exactly once — for the actual mutation — only after it resolves. Poll gaps now use jittered exponential backoff (20ms → 250ms cap). Drops the per-poll PG timeout race (DEFAULT_PG_TIMEOUT_MS / pgTimeoutMs / findRunInPgWithTimeout), unneeded now that PG is read once rather than in a tight loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove plan-tracking shorthand (Q2/Q3 design, _plans/) from mutations-layer mollifier comments; reword to plain English. Comment-only; no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… metadata fallback errors The tags API skipped MAX_TAGS_PER_RUN enforcement on the buffered path, letting a buffered run exceed the cap the trigger validator applies at creation. Enforce it atomically in the mutateSnapshot Lua: append_tags now accepts an optional maxTags and returns "limit_exceeded" (writing nothing) when the deduped count would overflow. mutateWithFallback gains a symmetric rejectedResponse builder + a "rejected" outcome; the tags route returns 422, matching the PG path. Also stop silently swallowing PG failures in the metadata route's parent/root op fan-out: warn (with targetRunId + error) before the best-effort buffer fallback so a genuine PG outage is observable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oute comments Two Phase labels survived the earlier sweep — "Phase A6" in api.v1.runs.\$runId.metadata.ts (the GET-loader-added comment, mirroring the same pattern in api.v1.runs.\$runParam.attempts.ts on reads), and "Phase B4" in api.v1.runs.\$runParam.replay.ts (referring to where SyntheticRun was extended). Rewritten to plain prose; comment-only, no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n comments Two stragglers from the prior sweep: a "(Q5)" tag in the resetIdempotencyKey buffer-side comment, and a "Phase F" reference in the applyMetadataMutation test's regression note. Comment-only; no behavioural change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llowups Four Devin-flagged issues on PR #3756: * Cross-env auth gate on the buffer mutation path (#2). mutateWithFallback and applyMetadataMutationToBufferedRun now verify entry.envId/orgId match the caller's environmentId/organizationId before any buffer write, so a token authed in env A can't mutate a buffered run in env B by guessing the friendlyId. Mismatches return not_found (no existence leak), mirroring the env scoping the PG path already enforces via Prisma filters. * Unhandled error in routeOperationsToRun (#0). The parent/root op fan-out is documented as best-effort but a Redis throw used to 500 the request even though the primary mutation already landed. The buffer fallback now runs through tryCatch and warns instead of throwing. * Silent no-op when parent metadata routes to an internal id (#1). The PG service accepts internal ids, but the buffer is keyed by friendlyId; passing an internal cuid to the fallback was a silent miss. Made it an intentional skip (with a comment explaining why a buffered child's parent is always materialised already). * BufferedReplayInputSchema strips seedMetadata (#3). Replays from a buffered source were silently losing initial metadata vs PG-sourced replays. Added seedMetadata + seedMetadataType to the schema. Tests added: cross-env + cross-org gate cases on both helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ionsToRun Follow-up to the Devin review on PR #3756. updateMetadataService.call resolves with undefined (not a throw) when the target run isn't in PG, so the previous `if (!error) return;` treated PG-miss as success and the buffer fallback below was unreachable. Capture the result too and fall through only on PG-applied; PG-miss now proceeds to the buffer fallback alongside the PG-threw path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4d5d0e0 to
9fd7957
Compare
427f563 to
279172f
Compare
…both metadata + operations Devin caught a divergence on PR #3756: when a request carries both `body.metadata` and `body.operations`, the PG path branches on `Array.isArray(body.operations)` and applies ops on top of EXISTING metadata (ignoring body.metadata), while the buffer path was merging both — body.metadata replacing first, then operations on top. Concrete divergence with existing {a:1} and request {metadata:{b:2}, operations:[set c=3]}: PG → {a:1,c:3}, buffer → {b:2,c:3}. Restructure to match PG: when operations are present, apply on top of snapshot's existing metadata; when only metadata, full replace; when neither, write existing back. Pin via regression test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…both metadata + operations Devin caught a divergence on PR #3756: when a request carries both `body.metadata` and `body.operations`, the PG path branches on `Array.isArray(body.operations)` and applies ops on top of EXISTING metadata (ignoring body.metadata), while the buffer path was merging both — body.metadata replacing first, then operations on top. Concrete divergence with existing {a:1} and request {metadata:{b:2}, operations:[set c=3]}: PG → {a:1,c:3}, buffer → {b:2,c:3}. Restructure to match PG: when operations are present, apply on top of snapshot's existing metadata; when only metadata, full replace; when neither, write existing back. Pin via regression test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Devin follow-up on PR #3756. The PG-side UpdateMetadataService calls handleMetadataPacket(...this.maximumSize) and throws MetadataTooLargeError (status 413) when the encoded packet exceeds the configured cap (TASK_RUN_METADATA_MAXIMUM_SIZE, default 256 KB). The buffered path just JSON.stringify'd and CAS-wrote with no size check, so a caller could write up to the route's 2 MB content limit on a buffered run that PG would have refused. Added a required `maximumSize` input to applyMetadataMutationToBufferedRun and a new `{ kind: "metadata_too_large", maximumSize, observedSize }` outcome that fires BEFORE the CAS write (don't churn the retry budget on an oversize payload). The metadata route passes env.TASK_RUN_METADATA_MAXIMUM_SIZE and maps the outcome to 413 with the same wording as MetadataTooLargeError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Devin follow-up on PR #3756. The PG-side UpdateMetadataService calls handleMetadataPacket(...this.maximumSize) and throws MetadataTooLargeError (status 413) when the encoded packet exceeds the configured cap (TASK_RUN_METADATA_MAXIMUM_SIZE, default 256 KB). The buffered path just JSON.stringify'd and CAS-wrote with no size check, so a caller could write up to the route's 2 MB content limit on a buffered run that PG would have refused. Added a required `maximumSize` input to applyMetadataMutationToBufferedRun and a new `{ kind: "metadata_too_large", maximumSize, observedSize }` outcome that fires BEFORE the CAS write (don't churn the retry budget on an oversize payload). The metadata route passes env.TASK_RUN_METADATA_MAXIMUM_SIZE and maps the outcome to 413 with the same wording as MetadataTooLargeError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Devin follow-ups on PR #3756. * Tags route synthesisedResponse reported `nonEmptyTags.length` (pre- dedup input count) while the PG path reports `newTags.length` (post- dedup against existing tags). Same API call → different message count across the buffered/materialised boundary. mutateWithFallback now forwards the pre-mutation BufferEntry from its env-auth pre-check into synthesisedResponse + rejectedResponse, so the tags route can dedup against the snapshot's existing tags without an extra Redis round-trip. Same TOCTOU semantics as the PG path; the Lua keeps the actual write atomically deduped regardless. * Replay route's `as unknown as TaskRun` cast bypasses TS for the buffered branch; if ReplayTaskRunService later reads an extra TaskRun field without that field being added to BufferedReplayInputSchema, the buffered branch silently feeds the service `undefined`. Run-time mitigation (safeParse + warn log + 404) already exists; added an inline comment near the cast spelling out the manual sync requirement so future edits notice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ffered runs Devin follow-up on PR #3756. The parent branch of the metadata route's fan-out passed `bufferedEntry.parentTaskRunId` (an internal cuid) where the rest of the route works in friendlyIds. `updateMetadataService.call` does auto-detect internal vs friendly via `runId.startsWith("run_")` so the lookup succeeded for the common case (parent materialised in PG), but the inconsistency surfaced as soon as `readFallback.server.ts` was fixed to derive `parentTaskRunFriendlyId` / `rootTaskRunFriendlyId` from the snapshot's internal IDs via `internalRunIdToFriendlyId` — at that point the parent path was the only consumer still passing the internal id. Switch the parent route to `bufferedEntry.parentTaskRunFriendlyId ?? runId`, mirroring the root path's shape. The `?? runId` matches the PG service's own self-fallback (`taskRun.parentTaskRun?.id ?? taskRun.id`) so a top-level run's parent ops land on itself rather than being silently dropped. Stale comment block about "parentTaskRunId is an internal id" removed — both fields are now friendlyIds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Cancel, replay, reschedule, metadata, tags, and idempotency-key-reset now succeed against a run that's still in the mollifier buffer. Mutations are applied to the buffered snapshot via Lua CAS; the drainer carries the mutation forward when it replays.
Primitives added:
mutateWithFallback— PG-first / buffer-fallback resolver with bounded-wait safety net for entries that transition mid-mutation.applyMetadataMutation— buffered metadata PUT mirroring the PG-side retry loop with CAS atomicity.resolveRunForMutation— discriminated-union resolver used by routefindResourceso the route builder's pre-action 404 check sees buffered runs.Routes wired (whole files, no GET/POST splits):
api.v2.runs.\$runParam.cancel.tsapi.v1.runs.\$runParam.replay.tsapi.v1.runs.\$runParam.reschedule.tsapi.v1.runs.\$runId.metadata.tsapi.v1.runs.\$runId.tags.tsresetIdempotencyKey.server.tsStacked on the reads PR.
Test plan